Efficient SVDD sampling with approximation guarantees for the decision boundary
Abstract
Support Vector Data Description (SVDD) is a popular one-class classifier for anomaly and novelty detection. But despite its effectiveness, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary that is hopefully equivalent to the one obtained on the full data set. According to the literature, a good sample should therefore contain so-called boundary observations, that is, the observations that SVDD would select as support vectors on the full data set. However, non-boundary observations are also essential: without them, a sample can fragment contiguous inlier regions and lead to poor classification accuracy. Other aspects, such as selecting a sufficiently representative sample, are important as well. Existing sampling methods largely overlook them, resulting in poor classification accuracy. In this article, we study how to select a sample that takes these points into account. Our approach is to frame SVDD sampling as an optimization problem, where constraints guarantee that the sample indeed approximates the original decision boundary. We then propose RAPID, an efficient algorithm to solve this problem. RAPID does not require any tuning parameters, is easy to implement, and scales well to large data sets. We evaluate our approach on real-world and synthetic data. Our evaluation is the most comprehensive one so far. The results show that RAPID outperforms its competitors in classification accuracy, in sample size, and in runtime.
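The abstract does not spell out how RAPID works, so the sketch below only illustrates the setting it addresses: train SVDD on a small subset and check how closely its decision boundary matches the one trained on the full data. It is a minimal pure-Python sketch under several assumptions. It uses a Gaussian kernel, so k(x, x) = 1 and the SVDD dual reduces to minimizing alpha^T K alpha subject to sum(alpha) = 1 and 0 <= alpha_i <= C. It solves that dual with a simple pairwise (SMO-style) update and selects the subset by plain uniform random sampling. The names `train_svdd`, `SVDDModel` and `agreement`, and the defaults for `C` and `gamma`, are hypothetical. This is not RAPID and not the authors' implementation.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2); note that k(x, x) = 1 for every x."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class SVDDModel:
    """Hypersphere in feature space with center sum_i alpha_i phi(x_i) and squared radius r2."""

    def __init__(self, data, alpha, gamma, r2, aka):
        self.alpha, self.gamma, self.r2, self.aka = alpha, gamma, r2, aka
        # Support vectors: training points that keep a non-zero dual weight.
        self.support = [(x, a) for x, a in zip(data, alpha) if a > 0.0]

    def dist2(self, z):
        """||phi(z) - center||^2 = k(z, z) - 2 sum_i alpha_i k(x_i, z) + alpha^T K alpha."""
        s = sum(a * gaussian_kernel(x, z, self.gamma) for x, a in self.support)
        return 1.0 - 2.0 * s + self.aka

    def is_inlier(self, z):
        return self.dist2(z) <= self.r2 + 1e-12


def train_svdd(data, C=1.0, gamma=0.1, iters=5000, tol=1e-9):
    """Hypothetical solver for the Gaussian-kernel SVDD dual:
    min alpha^T K alpha  s.t.  sum(alpha) = 1,  0 <= alpha_i <= C."""
    n = len(data)
    if C * n < 1.0:
        raise ValueError("infeasible: SVDD needs C >= 1/n")
    K = [[gaussian_kernel(x, y, gamma) for y in data] for x in data]
    alpha = [1.0 / n] * n  # feasible start: weights sum to 1 and 1/n <= C
    ka = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # cached (K alpha)_i
    for _ in range(iters):
        # Most violating pair: i can still grow (alpha_i < C), j can still shrink (alpha_j > 0).
        grow = [k for k in range(n) if alpha[k] < C - tol]
        shrink = [k for k in range(n) if alpha[k] > tol]
        if not grow or not shrink:
            break
        i = min(grow, key=ka.__getitem__)
        j = max(shrink, key=ka.__getitem__)
        if ka[j] - ka[i] < tol:
            break  # KKT conditions hold up to tol
        # Moving mass t from j to i changes the objective by 2t(ka_i - ka_j) + t^2 * eta.
        eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
        t = (ka[j] - ka[i]) / eta if eta > 1e-12 else alpha[j]
        t = min(t, C - alpha[i], alpha[j])  # keep both weights inside [0, C]
        alpha[i] += t
        alpha[j] -= t
        for k in range(n):
            ka[k] += t * (K[k][i] - K[k][j])
    aka = sum(a * v for a, v in zip(alpha, ka))
    # Unbounded support vectors (0 < alpha_k < C) lie on the sphere; average their distances.
    on_sphere = [k for k in range(n) if tol < alpha[k] < C - tol]
    on_sphere = on_sphere or [k for k in range(n) if alpha[k] > tol]  # heuristic fallback
    r2 = sum(1.0 - 2.0 * ka[k] + aka for k in on_sphere) / len(on_sphere)
    return SVDDModel(data, alpha, gamma, r2, aka)


def agreement(m1, m2, points):
    """Fraction of query points on which two SVDD models make the same inlier/outlier decision."""
    return sum(m1.is_inlier(p) == m2.is_inlier(p) for p in points) / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    full = train_svdd(data)
    # Plain uniform random sampling, standing in for a real sample-selection method.
    small = train_svdd(rng.sample(data, 30))
    grid = [(x / 2.0, y / 2.0) for x in range(-10, 11) for y in range(-10, 11)]
    print("support vectors on full data:", len(full.support))
    print("decision agreement on grid:", agreement(full, small, grid))
```

In this sketch, the support vectors are the training points with non-zero weight. Those with 0 < alpha_i < C sit on the boundary, which roughly corresponds to the boundary observations the abstract mentions. A sampling method, such as the one the article proposes, would replace the `rng.sample(...)` call.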
Similar resources
Efficient Boundary Tracking Through Sampling
The proposed algorithm for image segmentation is inspired by an algorithm for autonomous environmental boundary tracking. The algorithm relies on a tracker that traverses a boundary between regions in a sinusoidal-like path. Page’s cumulative sum (CUSUM) procedure and other methods are adapted to handle a high level of noise. Applications to large data sets such as hyperspectral, are of particu...
Rapid Sampling for Visualizations with Ordering Guarantees
Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our...
Reeb Space Approximation with Guarantees
The Reeb space, which generalizes the notion of a Reeb graph, is one of the few tools in topological data analysis and visualization suitable for the study of multivariate scientific datasets. First introduced by Edelsbrunner et al. [3], the Reeb space of a multivariate mapping f : X → R^d parameterizes the set of components of preimages of points in R^d. In this paper, we formally prove the converg...
Image Segmentation Through Efficient Boundary Sampling
This paper presents a combined geometric and statistical sampling algorithm for image segmentation inspired by a recently proposed algorithm for environmental sampling using autonomous robots [1].
Mixed Bregman Clustering with Approximation Guarantees
Two recent breakthroughs have dramatically improved the scope and performance of k-means clustering: squared Euclidean seeding for the initialization step, and Bregman clustering for the iterative step. In this paper, we first unite the two frameworks by generalizing the former improvement to Bregman seeding — a biased randomized seeding technique using Bregman divergences — while generalizing ...
Journal
Journal title: Machine Learning
Year: 2022
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-022-06149-0